We demonstrate that our recently introduced stochastic Hebb-like learning rule is capable of learning the problem of timing in general network topologies generated by the algorithm of Watts and Strogatz [20]. We compare our results with a learning rule proposed by Bak and Chialvo [3] and obtain not only a significantly better convergence behavior but also a dependence on the presentation order of the patterns to be learned. This dependence appears when an additional degree of freedom is introduced which allows the neural network to select the next pattern itself, whereas the learning rule of Bak and Chialvo remains unaffected. The dependence offers a bidirectional communication between the neuronal and the behavioural level and hence completes the action-perception-cycle, which is a characteristic of any living being with a brain.

I. INTRODUCTION

One of the most fascinating complex adaptive systems in nature is the brain. Despite its relatively simple basic units, the neurons, the cooperative behaviour of the interconnected neurons and its functional implications are only poorly understood. The problem in investigating this system is not only its complexity (the human brain, for example, consists of about $10^{11}$ neurons) but also its characteristic cycle structure, which is known as the action-perception-cycle. The difficulty with the action-perception-cycle, which was already known to von Uexküll in 1928, is that a closed formulation of the problem has to include a coupled description of the brain and the environment, because the actions of an animal are transformed by the environment into perceptions, which are transformed by the brain into actions, and so on. From this it is also clear that neither the perceptions nor the actions occurring in the system are randomly generated.

In this paper we address the question: How is the learning dynamics of a neural network affected by different mechanisms for the selection of an action? Because learning in neural networks is governed by a learning rule for the modification of the synaptic weights, one can ask more precisely whether the learning rule itself is affected by the action-selection mechanism.

We approach this problem by comparing two different biologically motivated learning rules for neural networks. The first was proposed by Bak and Chialvo [3] and combines experimental findings of Frey and Morris about synaptic tagging with a global reinforcement signal, which can be interpreted as a dopamine signal, e.g. as in the experiments of Otmakhova and Lisman [14]. The second was introduced by the author and extends the ingredients above by the results of Fitzsimonds about heterosynaptic long-term depression (LTD), which can be qualitatively explained by our stochastic learning rule. Both learning rules are local in the sense that the information used for the synaptic modification is provided only by the neurons which enclose the synapse. Hence they can be interpreted as extensions of the classical Hebbian learning rule [10].

As the problem to be learned we choose the problem of timing, e.g. catching a ball, in a recurrent network topology generated by the algorithm of Watts and Strogatz [20]. This network class was chosen because the topology depends on a single parameter, the so-called rewiring parameter, which allows a regularly connected network to be converted continuously into a random one. The regime between these two extremes, called small-world networks, has recently been of special interest because it could be related to experimental results about the neuroanatomical structure [23].

This paper is organized as follows. In section II we define our model. Section III demonstrates the practical working mechanism, exemplified by learning the problem of timing in a recurrent neural network. We compare the learning behavior of our learning rule with the learning rule of Chialvo and Bak [3] in dependence on two different action-selection-mechanisms. The paper ends in section IV with conclusions and an outlook on future work.



II. THE MODEL 

If one wants to investigate the learning dynamics of a neural network, one has to define every item of Table I, which characterizes the entire system. Metaphorically, items 1 to 4 define the brain of an animal. Because of our simplified description we call this the Toy-Brain-Model (TBM). The concrete definitions of the parts are as follows.

1. neuron model
2. topology of the neural network
3. network dynamics
4. learning rule
5. environment
6. interaction of the TBM with the environment

TABLE I: Characterization of the entire system.

1.) Neuron model: Binary neurons $x_i \in \{0, 1\}$ with $i \in \{1, \ldots, N\}$.

2.) Topology of the neural network: The neural network is given by the construction algorithm of Watts and Strogatz [20] with $N = 200$ neurons and $k = 10$ synapses per neuron, in dependence on a rewiring parameter $p_{\mathrm{rw}}$ which regulates the disorder in the network: $p_{\mathrm{rw}} = 0$ corresponds to a regularly connected and $p_{\mathrm{rw}} = 1.0$ to a randomly connected network. We choose the construction algorithm of Watts and Strogatz [20] because real brains are neither regularly nor randomly connected networks but something in between. Experimental results about the neuroanatomical structure concerning cortico-cortical connections in the macaque and the cat, as well as neuron-to-neuron couplings in C. elegans [1], indicate that there is a range of the rewiring parameter $p_{\mathrm{rw}}$ which is compatible with these experimental findings.
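The construction described in item 2 can be sketched as follows. This is a minimal illustration of the standard Watts-Strogatz procedure (ring lattice, then random rewiring), not the code used for the simulations. The function name `watts_strogatz`, the undirected edge representation and the `seed` argument are assumptions made for this example.

```python
import random


def watts_strogatz(n, k, p_rw, seed=None):
    """Return the undirected edges (i, j), i < j, of a Watts-Strogatz graph.

    n: number of neurons, k: (even) number of links per neuron in the
    initial ring lattice, p_rw: rewiring probability.
    """
    rng = random.Random(seed)
    # Regular ring lattice: link every node to its k/2 nearest
    # neighbours on each side.
    edges = set()
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            edges.add((min(i, j), max(i, j)))
    # Rewire each lattice edge with probability p_rw to a uniformly
    # chosen target, avoiding self-loops and duplicate edges.
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            old = (min(i, j), max(i, j))
            if old in edges and rng.random() < p_rw:
                candidates = [t for t in range(n)
                              if t != i
                              and (min(i, t), max(i, t)) not in edges]
                if candidates:
                    t = rng.choice(candidates)
                    edges.remove(old)
                    edges.add((min(i, t), max(i, t)))
    return edges


# Network size from the text: N = 200 neurons, k = 10 synapses each.
edges = watts_strogatz(200, 10, 0.5, seed=1)
```

Rewiring moves one end of an edge and never deletes or duplicates edges, so the total number of connections stays at $Nk/2$ for every value of $p_{\mathrm{rw}}$; only their arrangement changes.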

3.) Network dynamics (winner-take-all): The inner field of the neurons is calculated by

$$ h_i = \sum_{j}^{\mathrm{all}} w_{ij} x_j . \tag{2} $$

Here "all" indicates that the summation is carried out over all connected neurons. From the obtained inner fields $h_i$ we select the largest one,

$$ i_{\max} = \operatorname{argmax}_i (h_i), \tag{3} $$

and set the corresponding neuron activity to one and the remaining ones to zero:

$$ x_i = \begin{cases} 1, & i = i_{\max} \\ 0, & \text{otherwise.} \end{cases} \tag{4} $$

For this reason the network dynamics is called a winner-take-all mechanism, because only the neuron with the highest inner field becomes activated. In biological terms this kind of network dynamics corresponds to lateral inhibition.
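One time step of the winner-take-all dynamics can be sketched as below. The storage of the weights as a dictionary keyed by `(i, j)`, the function name and the tie-breaking rule (lowest index wins) are assumptions of this example; the text does not specify how ties are resolved.

```python
def winner_take_all_step(w, x):
    """Perform one winner-take-all update.

    w: dict mapping a synapse (i, j) to its weight w_ij, where j is the
       sending and i the receiving neuron (assumed representation).
    x: list of 0/1 neuron activities.
    Returns the new activity list and the index of the winning neuron.
    """
    # Inner field h_i: sum of w_ij * x_j over all connected neurons j.
    h = [0.0] * len(x)
    for (i, j), w_ij in w.items():
        h[i] += w_ij * x[j]
    # The neuron with the largest inner field wins; max() returns the
    # first maximum, so ties go to the lowest index.
    i_max = max(range(len(h)), key=lambda i: h[i])
    # Only the winner is active afterwards.
    return [1 if i == i_max else 0 for i in range(len(x))], i_max


# Neuron 0 is active and projects to neurons 1 and 2.
new_x, winner = winner_take_all_step({(1, 0): 0.2, (2, 0): 0.7}, [1, 0, 0])
```

Because exactly one neuron is active after each step, the activity traces a single path through the network, which is what makes "arrival time" at an output neuron well defined.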

4.) Learning rule: We choose two different learning rules to adjust the synaptic weights of the neural network and compare them in the results section.

4.a) The learning rule of Chialvo and Bak [3] depresses the weights of the active synapses,

$$ w_{ij} \rightarrow w'_{ij} = w_{ij} - \delta, \tag{5} $$

with $\delta \in [0,1]$, only if the output of the network was wrong. This is indicated by the reinforcement signal $r = -1$, which is democratically fed back to all synapses in the network. A synapse is called active if it was involved in the last signal processing. In the notation introduced by Klemm, Bornholdt and Schuster this can also be expressed by $\Theta = 0$ and $\delta \in [0,1]$, where $\Theta$ is the length of a synaptic counter which stores the past reinforcement signals.
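Rule 4.a) can be sketched in a few lines. The weight dictionary keyed by `(i, j)`, the function name and the `rng` argument are assumptions of this example; $\delta$ is drawn uniformly at random, as stated for the synaptic alterations in the results section.

```python
import random


def depress_active(w, active, r, rng=random):
    """Depress every active synapse by a random delta if the output was wrong.

    w: dict mapping a synapse (i, j) to its weight (assumed representation).
    active: synapses involved in the last signal processing.
    r: reinforcement signal, -1 for a wrong and +1 for a correct output.
    """
    if r == -1:
        for syn in active:
            # delta is drawn uniformly from [0, 1).
            w[syn] -= rng.random()
    return w


weights = {(1, 0): 0.5, (2, 1): 0.9}
depress_active(weights, [(1, 0)], r=-1, rng=random.Random(0))
```

Only the synapse listed as active is changed; a correct output (`r = 1`) leaves all weights untouched, so this rule never strengthens a connection.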

4.b) Stochastic Hebb-like learning rule: We recently introduced a novel stochastic learning rule and present here a simplified version, which is a special case of the original rule with one degree of freedom less.

As in rule 4.a), only active synapses $w_{ij}$ can be updated, and only if $r = -1$, which corresponds to a wrong network output. But now a synapse is updated only with the probability $p^{\mathrm{rank}}$, which is given by Eq. (9). If the update takes place, the synaptic weight is depressed by

$$ w_{ij} \rightarrow w'_{ij} = w_{ij} - \delta, \tag{6} $$

with $\delta \in [0,1]$.

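The stochastic update condition is based on neuron counters $c_i$ assigned to all neurons, whose dynamics is given by

$$ c_i \rightarrow c'_i = \begin{cases} \Theta, & \text{if } c_i - r > \Theta \\ c_i - r, & \text{if } \Theta \ge c_i - r \ge 0 \\ 0, & \text{if } 0 > c_i - r . \end{cases} \tag{7} $$

Here $\Theta \in \mathbb{N}$ is the memory length of the neuron counters and $r = \pm 1$ is the reinforcement signal. A wrong output ($r = -1$) thus increases a counter and a correct output ($r = 1$) decreases it, within the bounds $0$ and $\Theta$. Equation (7) applies only to the active neurons. The other neuron counters remain unchanged.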
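The counter dynamics amounts to shifting the counter by $-r$ and clipping it to the interval $[0, \Theta]$. A minimal sketch, with hypothetical function names and a plain list for the counters:

```python
def update_counter(c, r, theta):
    """Shift the neuron counter by -r and clip the result to [0, theta]."""
    return max(0, min(theta, c - r))


def update_active_counters(counters, active_neurons, r, theta):
    """Update the counters of the active neurons only; others are unchanged."""
    for i in active_neurons:
        counters[i] = update_counter(counters[i], r, theta)
    return counters


# Memory length theta = 3 as in the text; neurons 0 and 2 were active
# and the network output was wrong (r = -1).
counters = update_active_counters([0, 1, 3], [0, 2], r=-1, theta=3)
```

With these inputs the first counter rises from 0 to 1, the second is untouched because its neuron was not active, and the third stays at the upper bound 3.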

The probability $p^{\mathrm{rank}}$ of the stochastic update condition is obtained by the following procedure:

1. Calculate the approximated synaptic counters $\tilde{c}_{ij}$ of the active synapses from the neuron counters,

$$ \tilde{c}_{ij} = c_i + c_j . \tag{8} $$

2. Because $c_i \in \mathbb{N}$ holds for all $i \in \{1, \ldots, N\}$, it follows that $\tilde{c}_{ij} \in \mathbb{N}$. Hence one can calculate an approximated synapse counter for each active synapse and assign to it a probability $p^{\mathrm{rank}}$, which is given by the rank ordering distribution

$$ P^{\mathrm{rank}}_k \propto k^{-\tau}, \tag{9} $$
$$ k \in \{1, \ldots, 2\Theta + 3\}, \tag{10} $$
$$ \tau \in \mathbb{R}^{+}, \tag{11} $$

with the mapping $k = 2\Theta + 3 - \tilde{c}_{ij}$.

For the following simulations we used a neuron memory of length $\Theta = 3$ and chose the exponent of the rank ordering distribution as $\tau = 2.0$.
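The two-step procedure and the resulting stochastic weight update of rule 4.b) can be sketched as follows. The text only states the proportionality $P^{\mathrm{rank}}_k \propto k^{-\tau}$; normalizing over all ranks $k \in \{1, \ldots, 2\Theta + 3\}$ is an assumption of this example, as are the function names and the data layout.

```python
import random


def update_probability(c_i, c_j, theta=3, tau=2.0):
    """Update probability of an active synapse from its two neuron counters."""
    c_tilde = c_i + c_j              # approximated synaptic counter
    k = 2 * theta + 3 - c_tilde      # rank assigned to the synapse
    # Assumed normalisation: sum of k**(-tau) over all admissible ranks.
    norm = sum(n ** -tau for n in range(1, 2 * theta + 4))
    return k ** -tau / norm


def stochastic_hebb_update(w, active, counters, r, theta=3, tau=2.0,
                           rng=random):
    """Depress each active synapse with probability p_rank if r == -1."""
    if r == -1:
        for (i, j) in active:
            p = update_probability(counters[i], counters[j], theta, tau)
            if rng.random() < p:
                w[(i, j)] -= rng.random()   # delta uniform in [0, 1)
    return w


p_low = update_probability(0, 0)    # no recorded failures: rank k = 9
p_high = update_probability(3, 3)   # maximal counters:     rank k = 3
```

Synapses between neurons with high counters, that is neurons that were repeatedly involved in wrong outputs, receive a low rank $k$ and are therefore more likely to be depressed than synapses between neurons with low counters.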

5.) Environment: The problem to be learned is a mapping from input neurons to output neurons. As input (output) neurons we define $x_{5(m-1)+1}$ ($x_{100+(m-1)5}$) for the patterns $m \in \{1, \ldots, M\}$. The mapping consists of a connection from input neuron $x_{5(m-1)+1}$ via interneurons to output neuron $x_{100+(m-1)5}$ in exactly $T_c = 4$ time steps. Arriving sooner or later at the predefined output neuron is counted as a wrong network output. For this reason we call the problem to be learned timing.
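The index convention and the timing criterion can be made concrete with a short sketch. The helper names and the representation of a run as the list of winning neurons at time steps $1, 2, \ldots$ are assumptions of this example.

```python
def input_neuron(m):
    """Index of the input neuron of pattern m (m = 1, ..., M)."""
    return 5 * (m - 1) + 1


def output_neuron(m):
    """Index of the output neuron of pattern m."""
    return 100 + (m - 1) * 5


def reinforcement(trajectory, m, t_c=4):
    """Return +1 if the output neuron of pattern m is reached for the first
    time exactly at step t_c, otherwise -1.

    trajectory: winning neuron indices at time steps 1, 2, ... (assumed).
    """
    target = output_neuron(m)
    on_time = len(trajectory) >= t_c and trajectory[t_c - 1] == target
    too_early = target in trajectory[:t_c - 1]
    return 1 if on_time and not too_early else -1


r_good = reinforcement([7, 30, 52, 100], m=1)    # arrives at step 4
r_early = reinforcement([7, 100, 52, 100], m=1)  # already there at step 2
```

For $M = 3$ the input neurons are 1, 6 and 11 and the output neurons are 100, 105 and 110, so inputs and outputs lie far apart on the ring of $N = 200$ neurons.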

The difficulty of the problem to be learned lies in the recurrent topology of the network. In multilayer feedforward neural networks the output neurons in the last layer are always reached after a number of time steps fixed by the number of layers. This is not the case for recurrent networks. Hence the problem is not only to reach the predefined output neuron but to reach it exactly after a predefined number of time steps.

A similar problem has been studied in a series of papers (see, e.g., [18]). In contrast to our approach, these studies used a random topology of the neural network and learned a mapping within a predefined time, which is easier because, e.g., direct connections from input to output neurons are not forbidden. However, this is not desirable for two reasons. First, the brain of animals is divided into different areas which are specialized for certain tasks, e.g. the sensory or the motor cortex, which correspond to our input and output neurons. Between these parts there is no direct connection. They are connected via interneurons which are themselves parts of other specialized areas, e.g. the hippocampus for the consolidation of memory. Second, there are natural problems an animal faces which can only be solved by correct timing of the animal's motor action. For example, monkeys have to catch a branch at the right time to prevent themselves from falling from the tree.

6.) Interaction of the TBM with the environment: For the interaction of the TBM with the environment we choose two different strategies, which are compared in the results section.

6.a) Action-selection-mechanism (ASM) I: The patterns are presented independently with equal probability.

6.b) Action-selection-mechanism (ASM) II: In this case the TBM is equipped with an attribute which allows it to choose one of the M numbered patterns explicitly. Moreover, this action (pattern) selection mechanism has a memory of length M to store the results of the last M outcomes. Initially pattern 1 is chosen by the action-selection-mechanism of the TBM until the mapping is learned, which is signalled by the reinforcement signal $r = 1$. Then pattern 2 is selected and the procedure is repeated until both mappings are learned. For this we need the memory to store the last outcomes. This procedure goes on until all M patterns are learned. We emphasize that the patterns are always presented sequentially according to their number. ASM II can be visualized metaphorically as a possible strategy for learning words of a foreign language. If one wants to learn a certain number of words, one would not choose them randomly but selectively. Which strategy is best differs from person to person, but moving on only after some words have been learned correctly seems very appealing.
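The two presentation strategies can be sketched as below. The text does not fully specify how ASM II returns to earlier patterns, so this example rests on one possible reading, stated here as an assumption: the current pattern is repeated until it succeeds, the selection then advances cyclically, and learning counts as complete once the last M outcomes were all successes. The function and class names are hypothetical.

```python
import random
from collections import deque


def asm_one(num_patterns, rng=random):
    """ASM I: draw a pattern number uniformly from 1..M, independent of
    the state of the network."""
    return rng.randint(1, num_patterns)


class AsmTwo:
    """ASM II (assumed reading): stay on a pattern until r = 1, then move on."""

    def __init__(self, num_patterns):
        self.num_patterns = num_patterns
        self.current = 1
        # Memory of length M holding the last M reinforcement signals.
        self.memory = deque(maxlen=num_patterns)

    def select(self):
        return self.current

    def feedback(self, r):
        self.memory.append(r)
        if r == 1:
            # Advance to the next pattern number, wrapping M -> 1.
            self.current = self.current % self.num_patterns + 1

    def all_learned(self):
        return (len(self.memory) == self.num_patterns
                and all(r == 1 for r in self.memory))


asm = AsmTwo(2)
presented = []
for r in (-1, 1, 1):          # pattern 1 fails once, then both succeed
    presented.append(asm.select())
    asm.feedback(r)
```

In this sketch the presentation order depends on the network's own outcomes, whereas `asm_one` ignores them; that feedback from outcome to next stimulus is the additional degree of freedom the text describes.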

The terms "environment" and "action-selection-mechanism" were chosen to indicate that we are trying to describe a simple but natural situation in which an animal interacts with its environment to solve a problem which it faces. Hence the interactions with the environment are not random but based on the preceding experience which is accumulated in the brain. From a mathematical point of view ASM I looks natural, but it is completely independent of the state of the neural network and therefore of the information which was gathered. ASM II is a first step towards connecting the neural network with the presentation statistics of the patterns, which have to be generated in a self-organized way by the animal itself. The cyclic connection from the perception of a stimulus to the selection of an action is called the action-perception-cycle and is a characteristic of all living beings.



III. RESULTS 

In this section we present the results for the model defined in section II. More precisely, we investigate the learning behavior of the neural network in dependence on the rewiring parameter $p_{\mathrm{rw}}$ of the network topology, the action-selection-mechanism and the learning rule used to adapt the synaptic weights. The question that naturally arises is how to evaluate the performance of the network. This is not a trivial question because, unlike in learning by back-propagation [15, 21] in artificial neural networks, our model defines no explicit cost function which is minimized during the learning process. Instead we use a rule-based adaptation mechanism in the form of the Hebb-like learning rules 4.a) and 4.b), which are plausible from a biological point of view. To overcome this problem we introduce a virtual outer observer which observes the entire system and hence possesses all information occurring in the system. Mathematically this is done by an identical copy of the entire system in Table I, however with $\delta = 0$, which prevents further learning during the evaluation procedure. The patterns can then be presented in an arbitrary order, because there are no correlations between them, and we determine at each time step

$$ E(t)_{\mathrm{iabs}} = 1 - \frac{\#\,\text{of patterns learned up to time step } t}{M} . \tag{12} $$

This is the individual absolute error (iabs) of one network at time $t$. "Individual" indicates that this measure is not yet averaged over an ensemble of simulations. Because the synapses of the network are randomly initialized and the synaptic alterations $\delta$ are chosen randomly from $[0, 1]$, $E(t)_{\mathrm{iabs}}$ is a stochastic process for which the first-passage time $T_{\mathrm{FPT}}$, at which $E(t)_{\mathrm{iabs}}$ reaches zero for the first time, is a well-defined random variable.
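The evaluation by the outer observer can be illustrated with a small sketch. It assumes that the individual absolute error is the fraction of patterns not yet learned, so that it is zero exactly when all M patterns are mapped correctly; the function names are hypothetical.

```python
def individual_absolute_error(num_learned, num_patterns):
    """Fraction of the M patterns not yet learned (assumed form of the error)."""
    return 1.0 - num_learned / num_patterns


def first_passage_time(errors):
    """Return the first time step at which the error is zero, or None if
    the error never reaches zero within the recorded run."""
    for t, e in enumerate(errors):
        if e == 0:
            return t
    return None


# Number of correctly mapped patterns found by the observer at t = 0..4
# for M = 3; a learned pattern can be lost again (step 3).
learned = [0, 1, 3, 2, 3]
errors = [individual_absolute_error(n, 3) for n in learned]
t_fpt = first_passage_time(errors)
```

The example also shows why the first-passage time is used rather than the final error: a mapping that was already correct can be destroyed by later weight changes, so the error is not monotone in time.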
We choose the first-passage time $T_{\mathrm{FPT}}$ at the threshold $E_{\mathrm{iabs}} = 0$ and its distribution $p^{\mathrm{FPT}}$ to evaluate the performance of an ensemble of networks. From the distribution $p^{\mathrm{FPT}}$ one can derive two quantitative measures. First, the mean first-passage time $\langle T_{\mathrm{FPT}} \rangle$ given by

$$ \langle T_{\mathrm{FPT}} \rangle = \sum_{t} t \, p^{\mathrm{FPT}}(t) . \tag{13} $$

We omit the indices for the value of the threshold because we only investigate the case $E_{\mathrm{iabs}} = 0$. Second, a measure for the speed of the convergence, the distribution function

$$ P^{E}(t) = \sum_{t'=0}^{t} p^{\mathrm{FPT}}(t') . \tag{14} $$

The distribution function $P^{E}(t)$ lies between 0 and 1 and indicates the fraction of the networks which have learned the mapping of all patterns up to time $t$.

Figure 1 shows an example of the distribution $p^{\mathrm{FPT}}$ of the first-passage times. The shape of $p^{\mathrm{FPT}}$ is characteristic: its long tail reflects that some networks need much more time to learn the mapping than others.
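Both measures can be estimated directly from the first-passage times of an ensemble. In this sketch the list `fpts` holds only the networks that converged within the simulation time, while `ensemble_size` counts all simulated networks; these names and this bookkeeping are assumptions of the example.

```python
def mean_first_passage_time(fpts):
    """Sample mean of the observed first-passage times."""
    return sum(fpts) / len(fpts)


def distribution_function(fpts, ensemble_size, t):
    """Fraction of the ensemble that has learned all patterns up to time t."""
    return sum(1 for t_fpt in fpts if t_fpt <= t) / ensemble_size


# Four simulated networks, one of which never converged.
fpts = [10, 20, 30]
mean_fpt = mean_first_passage_time(fpts)
fraction_at_20 = distribution_function(fpts, ensemble_size=4, t=20)
```

When part of the ensemble does not converge, the sample mean is taken over the converged networks only and therefore underestimates the true mean first-passage time, while the distribution function saturates below 1.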
FIG. 1: Distribution $p^{\mathrm{FPT}}$ of the first-passage times for learning rule 4.b), rewiring parameter $p_{\mathrm{rw}} = 1.0$ and $M = 3$ patterns, generated by ASM I. The histogram, with bin width 500, was generated by simulations over an ensemble of size N = 1000. The inset is a magnification of the first 20000 time steps.

A. Action-selection-mechanism I.

M = 3 patterns:
Figure 2 shows the distribution function $P^{E}(t)$ for learning rule 4.a) (dash-dot lines) and 4.b) (full lines) in dependence on the rewiring parameter $p_{\mathrm{rw}}$. One clearly recognizes that the convergence behavior of learning rule 4.b) is always significantly better. In general, the smaller $p_{\mathrm{rw}}$ becomes, the longer it takes to converge. This is due to the fact that for $p_{\mathrm{rw}} = 0$ the network is regularly connected without any shortcuts between more distant neurons. Hence there is no path which connects the input with the output neurons within $T_c = 4$ time steps. If one increases $p_{\mathrm{rw}}$, more and more such shortcuts exist and the problem can be learned more easily. For $p_{\mathrm{rw}} > 0$ the problem consists not only in finding connecting paths between input and output neurons but also in preserving the paths of already correctly learned mappings. This interplay between path exploration and path conservation makes the problem hard, especially for low values of the rewiring parameter.

FIG. 2: Distribution function $P^{E=0}(t)$ for $M = 3$ and learning rule 4.a) (dash-dot line) and 4.b) (full line), obtained for ASM I. The curves are ordered from top to bottom from $p_{\mathrm{rw}} = 1.0$ to $p_{\mathrm{rw}} = 0.6$ (upper figure) and from $p_{\mathrm{rw}} = 0.5$ to $p_{\mathrm{rw}} = 0.1$ (lower figure).


Table II gives in its second column the percentage of the ensemble that could learn the problem within the simulation time of T = 10^ time steps. One can see that learning rule 4.b) is better than 4.a) in this category as well, because its convergence percentage is always greater in every parameter region of $p_{\mathrm{rw}}$. However, for $p_{\mathrm{rw}} < 0.3$ even learning with rule 4.b) is not perfect.

M = 5 patterns:
In Figure 3 the corresponding results for $M = 5$ patterns are shown. Here the effects mentioned above are amplified by the larger number of patterns to be learned. This results in an almost complete breakdown for learning rule 4.a), which is now able to learn the problem only for $p_{\mathrm{rw}} \in \{0.9, 1.0\}$ and only for a few networks. Learning rule 4.b) works much better in this case as well. However, learning within T = 10^ time steps is always incomplete.

TABLE II: Ensemble sizes used for the corresponding simulations and the percentage of networks which learned the mapping correctly after T = 10^ time steps (ensemble size/percentage). LR means learning rule. The action-selection-mechanism used in these simulations was ASM I.
FIG. 3: Distribution function $P^{E=0}(t)$ for $M = 5$ and learning rule 4.a) (dash-dot line) and 4.b) (full line), obtained for ASM I. The curves for learning rule 4.b) are ordered from top to bottom from $p_{\mathrm{rw}} = 1.0$ to $p_{\mathrm{rw}} = 0.1$ and for learning rule 4.a) from $p_{\mathrm{rw}} = 1.0$ to $p_{\mathrm{rw}} = 0.9$.



B. Action-selection-mechanism II. 

The results for action-selection-mechanism II and $M = 3$ and $M = 5$ patterns, respectively, are summarized in Table III. A comparison with Table II shows that the overall results are confirmed. Learning rule 4.b) always obtains significantly better results than 4.a). Moreover, a direct comparison between the learning rules for ASM I and ASM II reveals that learning rule 4.a) seems to be unaffected by the action-selection mechanism, whereas learning rule 4.b) is clearly influenced.

TABLE III: Ensemble sizes used for the corresponding simulations and the percentage of networks which learned the mapping correctly after T = 10^ time steps (ensemble size/percentage). LR means learning rule. The action-selection-mechanism used in these simulations was ASM II.

C. Comparison of ASM I. and ASM II. 

To quantify the dependence of the learning behavior on the action-selection mechanism we calculate the mean first-passage time $\langle T_{\mathrm{FPT}} \rangle$ from the simulation results obtained so far. Figure 4 compares the results for learning rules 4.a) and 4.b) in dependence on the rewiring parameter $p_{\mathrm{rw}}$ and the number of patterns to be learned. One can clearly see that the mean first-passage time for learning rule 4.b) (the upper (lower) two curves correspond to $M = 5$ ($M = 3$)) is significantly reduced for ASM II (full lines), whereas the results for learning rule 4.a) are not affected (the middle curves correspond to $M = 3$).

This can be explained by the different structure of the two learning rules. Learning rule 4.a) possesses no memory of the outcomes of past results but only a tagging mechanism for the neurons which were involved in the last signal processing step. Hence it cannot detect the differences between the two action-selection mechanisms, because they differ only in the order of the presented patterns but not in the overall presentation statistics. This follows from the fact that learning the last pattern takes about 90% of the first-passage time. Learning rule 4.b) differs in this point because of the neuron counters $c_i$. The neuron counters are a memory for the outcomes of past results and thus can detect the slight difference between the two action-selection mechanisms.

We think that this result is worth discussing in detail because it reveals a deep characteristic of animals which is normally neglected in investigations of neural networks. The consequence of the results obtained above is not only that the learning rule of a neural network affects the neural activity through synaptic changes, and hence the behavior of an animal, which is common sense, but also that the reverse holds. That means the actions of an animal influence the learning rule of its neural network. This is caused by the stimuli generated by the animal's actions, which are represented in the examples above as patterns. They lead to a modulation of the neural activity in the network and hence to a modulation of the learning rule due to the memory effects of the neuron counters. This seems plausible because we do not choose our actions randomly but choose them in order to learn something as fast as possible to survive. Moreover, it is not only plausible but also efficient to use the action-selection mechanism as a source of information, as is shown in Figure 4.

FIG. 4: Mean first-passage time $\langle T_{\mathrm{FPT}} \rangle$ in dependence on the rewiring parameter $p_{\mathrm{rw}}$. Full lines correspond to simulations with ASM II, dash-dotted lines to ASM I. The two upper (lower) curves are obtained with learning rule 4.b) and $M = 5$ ($M = 3$) patterns, the two middle curves with learning rule 4.a) and $M = 3$ patterns. "*" indicates that 100% of the ensemble converged within the simulation time of 10^ time steps, whereas "□" indicates that this is not the case. In the latter case the obtained results for $\langle T_{\mathrm{FPT}} \rangle$ are only estimates.

Hence our investigations lead not only to a bottom-up communication but also to a top-down communication between different system levels. In this respect our learning rule with neuron counters differs from all other Hebb-like learning rules which have been proposed as extensions of the classical Hebbian rule [10]. These rules lack a memory and therefore cannot be affected by action-selection mechanisms which differ not in the presentation statistics but only in the presentation order.



IV. CONCLUSIONS 

In this article we investigated the properties of our recently proposed stochastic Hebb-like learning rule for neural networks. We demonstrated by extensive numerical simulations that the problem of timing can be learned in different topologies of a neural network generated by the algorithm of Watts and Strogatz [23]. A comparison with the learning rule of Chialvo and Bak [3] not only gave significantly better results throughout but also revealed that our stochastic Hebb-like learning rule can discriminate between different action-selection mechanisms with the same presentation statistics but different presentation order. This difference forms a source of information and can positively affect the learning behavior due to the bidirectional communication between different system levels. This effect was recognized only because we did not want to model the brain of an animal alone but its action-perception-cycle, schematically depicted in Table I, where the brain is only one part of the entire system.

In summary, our stochastic Hebb-like learning rule is universally applicable not only in feedforward multilayer networks but also in a class of recurrent networks generated by [20], as demonstrated in this article. Together with its biological interpretation as a qualitative form of heterosynaptic plasticity and its sensitivity to the presentation order of the patterns to be learned, we believe that our learning rule unites some crucial ingredients on the way to our understanding of the action-perception-cycle and hence of the brain. We believe that only such an integrated approach can explain the functional working method of the entire system, because its parts are coupled in a nonlinear or stochastic way.